Our development of an ideal-observer framework and a test-pedestal methodology for modeling vision
without the numerous assumptions of previous models has provided a comprehensive understanding
of the spatio-temporal characteristics of human vision. The methodology encompasses a limited set of
test stimuli with a multiplicity of pedestals to facilitate the comparison of performance across many
psychophysical tasks. For example, it is shown that vernier acuity can generally be predicted from an
individual's contrast discrimination threshold. For the conditions under which contrast discrimination
predictions break down, a detailed modeling of later stages of visual processing is required. As a
result, specifications for a vision modeling tool have been developed to guide the creation of a
comprehensive vision modeling environment. As our models of visual function have matured, we
have applied them to practical issues such as image compression and image quality. Consideration of
properties of human vision is essential if the image compression needed for new technologies such as
HDTV is to avoid sacrificing image quality. The success of the test-pedestal methodology has also
led us to record human visual evoked potentials so that we may integrate our psychophysical data and
models of vision with underlying physiological mechanisms.


vision models, human vision, image compression, image quality, evoked potentials, ideal observer




Progress  report  for  AFOSR-89-0238 


February  5,  1992 


Title:  Spatio-temporal  Masking:  Hyperacuity  and  Local  Adaptation. 

Period  covered:  January  1,  1990  -  December  31,  1991 

Objectives  of  the  research  effort. 

a. Develop robust "ideal-observer" methods for modeling spatial vision with fewer assumptions than
previous models. The proposed test-pedestal framework for spatio-temporal interaction uses the
same test pattern with multiple pedestals in different phase relationships to learn about the properties
of the underlying mechanisms.

b. As part of our work on improving models of spatial vision, I have been meeting with a group of
Berkeley faculty to organize a major effort to develop a user-friendly modeling environment.

c.  An  important  direction  of  our  Air  Force  research  is  to  take  our  models  of  human  vision  and  apply 
them  to  several  areas  of  applied  image  processing  (image  compression  and  image  quality),  a  timely 
topic  with  the  coming  of  high  definition  television  and  other  digital  image  technologies. 

d. We desire to connect our psychophysics research to the underlying physiological mechanisms. In
this regard we have found that new techniques are needed to learn about nonlinear visual processing,
so we have also devoted a significant effort to improving methods for the nonlinear analysis and
source localization of visual evoked potentials and other biopotentials.

Summary  of  Research  Effort. 

The  past  two  years  have  been  very  productive  for  my  research  group.  Based  on  AFOSR 
support  we  have  15  papers  either  published  or  in  press.  I  am  including  one  copy  of  each  paper  with 
this  report.  If  I  were  to  summarize  the  work  done  on  each  paper  this  document  would  become 
excessively  lengthy.  Instead,  this  summary  will  be  restricted  to  the  following:  1)  A  general  summary 
of  our  research  on  connecting  the  insights  from  human  vision  to  applied  topics  (image  compression 
and  image  quality),  2)  A  more  detailed  summary  of  our  work  on  vernier  acuity  to  show  our  approach 
to  modeling.  An  outgrowth  of  this  research  is  our  collaboration  with  other  Berkeley  faculty  to  develop 
a  user-friendly  environment  for  modeling  vision.  3)  A  summary  of  our  work  measuring  biopotentials. 
In  order  to  connect  our  psychophysical  data  and  models  to  underlying  physiological  mechanisms  we 
have  executed  several  visual  evoked  potential  studies  and  have  developed  new  methodologies  for 
studying  nonlinear  physiological  processing. 

1) Image compression and image quality. Seven of the papers that have been written during
the two-year period of this report are connected with the assessment of image quality. The field of
image compression is growing very rapidly because of the arrival of HDTV, teleconferencing and
picture phone and also because of developing standards by committees such as JPEG and MPEG.

The standards developed by JPEG and MPEG are ideal for the vision research community. The
quantization algorithm is based squarely on properties of the human visual system. What is needed is
much more information from researchers in human vision on how to do context-dependent image
compression and image quality evaluation. That is, different types of image degradation will be more
or less visible depending on the local context of the image. This is the topic of visual masking. One of
the directions of both our basic research and our applied research is to show that the calculation of
masking magnitude is more difficult than commonly believed. We are able to demonstrate situations
with strong pedestals both in space and in time where the masking is minimal. In particular, as
shown in (Hu, Klein & Carney, reference #14), the masking of a thin line by a uniform flash is
very localized in time and limited in strength. Based on these results we have calculated that the
human visual system is able to take in much more information than is transmitted by normal
compression schemes. We believe that our research is relevant whenever the goal is to achieve
perceptually lossless compression.

We would also like to highlight the paper by Klein & Beutter (1992, reference #7) because it
has caused a stir in several fields. In image compression (and image processing in general) one wants
to use filters that are localized in both space and spatial frequency. Gabor once made a claim that for
real-valued functions, the Hermite functions (Hermite polynomials times a Gaussian) minimize the
joint space-spatial frequency uncertainty. What we showed was that for the class of functions that are
an nth-order polynomial times a Gaussian, the Hermite functions maximize the joint uncertainty.
Individuals from different disciplines have indicated the usefulness of this result (e.g. I have been
informed that Hermite functions characterize the profile of laser beams, and based on Gabor's work
it had been thought that this was a good profile).
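
The behavior of the joint uncertainty can be checked numerically. The sketch below is illustrative only and is not taken from reference #7: it assumes the joint uncertainty is the product of the root-mean-square widths in space and in angular spatial frequency (the paper's definition may differ), and the names hermite_function and joint_uncertainty are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite import hermval


def hermite_function(n, x):
    """Physicists' Hermite polynomial H_n(x) times exp(-x^2 / 2), unnormalized."""
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0
    return hermval(x, coeffs) * np.exp(-x**2 / 2.0)


def joint_uncertainty(f, x):
    """Assumed definition: sigma_x * sigma_w for a real profile f on a uniform grid x."""
    dx = x[1] - x[0]
    energy = np.sum(f**2) * dx
    mean_x = np.sum(x * f**2) * dx / energy
    var_x = np.sum((x - mean_x) ** 2 * f**2) * dx / energy
    # For a real function the mean frequency is zero, so the frequency
    # variance equals the normalized energy of the derivative (Parseval).
    df = np.gradient(f, dx)
    var_w = np.sum(df**2) * dx / energy
    return np.sqrt(var_x * var_w)


x = np.linspace(-12.0, 12.0, 8001)

# Uncertainty of the first few Hermite functions.
for n in range(4):
    print(n, joint_uncertainty(hermite_function(n, x), x))

# First-order polynomial times a Gaussian: (x + a) * exp(-x^2 / 2).
# a = 0 is the first-order Hermite function (up to a constant factor).
for a in (0.0, 0.5, 1.0, 2.0):
    print(a, joint_uncertainty((x + a) * np.exp(-x**2 / 2.0), x))
```

Under this definition the n-th Hermite function has an analytic uncertainty product of n + 1/2, so it grows with order, and within the first-order family the product is largest at a = 0, which is the Hermite case.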


2) Vernier acuity and modeling. Using the test-pedestal paradigm, we initially established a
connection between vernier acuity and contrast discrimination using sinusoidal stimuli (Hu, Klein &
Carney, 1992, reference #16). Rather than using a complex model with many assumptions, we
showed that to first order vernier acuity was well predicted by contrast discrimination thresholds
when both tasks are expressed in the same contrast units. These results can be explained by both tasks
using common underlying mechanisms. However, several notable exceptions were evident: at high
spatial frequencies vernier thresholds degraded faster than expected from contrast discrimination data.
Moreover, the tvi slope was always shallower for the vernier task. We have been exploring other
stimulus configurations such as different line lengths and central gaps in both tasks to determine the
source of the deviations from predictions based on contrast discrimination. At high spatial
frequencies, decreasing grating length hurt contrast discrimination thresholds but actually improved
vernier acuity. The vernier task is presumably performed by oriented mechanisms; the long grating
possibly diluted or smeared out the orientation cue, which reduced the effectiveness of oriented
mechanisms. The difference in tvi slope is likely due to the use of oriented mechanisms. If vernier
acuity involves mechanisms oriented away from the pedestal orientation, it would avoid some of the
contrast masking effects observed for the contrast discrimination task, thereby resulting in a different
tvi slope.
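
Expressing both tasks "in the same contrast units" can be made concrete with a small sketch. It rests on an assumption not spelled out in this report: a small vernier offset of a sinusoidal grating is treated as the pedestal grating plus a test grating of the same frequency in quadrature phase, using the identity c sin(kx + p) = c cos(p) sin(kx) + c sin(p) cos(kx). The function names and example values are hypothetical.

```python
import math

ARCSEC_PER_DEG = 3600.0


def offset_to_test_contrast(pedestal_contrast, freq_cpd, offset_arcsec):
    """Contrast of the quadrature test grating equivalent to a vernier offset.

    Assumes the offset only shifts the phase of a grating with the given
    pedestal contrast and spatial frequency (cycles per degree).
    """
    phase = 2.0 * math.pi * freq_cpd * offset_arcsec / ARCSEC_PER_DEG  # radians
    return pedestal_contrast * math.sin(phase)


def contrast_to_offset(pedestal_contrast, freq_cpd, test_contrast):
    """Inverse mapping: vernier offset (arcsec) for a quadrature test contrast."""
    phase = math.asin(test_contrast / pedestal_contrast)
    return phase * ARCSEC_PER_DEG / (2.0 * math.pi * freq_cpd)


# Illustrative values only: 20% pedestal, 5 c/deg grating, 10 arcsec offset.
equivalent_contrast = offset_to_test_contrast(0.2, 5.0, 10.0)
print(equivalent_contrast)
print(contrast_to_offset(0.2, 5.0, equivalent_contrast))
```

With thresholds converted this way, a vernier threshold can be placed on the same axis as a contrast discrimination threshold measured with an in-phase test on the same pedestal.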

We are now also comparing vernier acuity and grating detection thresholds in the presence of
oriented masking gratings. The masks have large effects at orientations somewhat different from the
vernier target grating. This masking effect is greatly reduced in the grating detection task. In
summary, vernier acuity for the most part is well predicted by contrast discrimination. Future models
of vernier acuity will have to consider properties of later stages of processing in order to account for
performance where contrast discrimination does not predict vernier acuity. Such models will likely
include multiple mechanisms at different spatial scales, orientations, and densities. We have spent
some time working with various modeling tools, such as the early vision emulation software (EVE),
Mathematica and Matlab, but have found each lacking in ease of use or flexibility. Our frustration with
these tools, shared by several other faculty members at UCB, led to regular meetings on the
requirements for a powerful vision modeling environment. Besides Thom Carney and myself,
participants in this group include Jitendra Malik, Marty Banks, Ted Cohn and Gordon Legge (who
was on sabbatical here). Having enumerated the required capabilities for such an environment to be
useful to the vision community at large, we are now preparing a proposal for implementing a
comprehensive vision modeling tool. Such a tool would prove invaluable in extending our
understanding of spatial and temporal vision.


3) Connecting psychophysics to physiology: The visual evoked potential. The power
of the test-pedestal approach for revealing underlying mechanisms led us to record visual evoked
potentials (EP) to such stimuli with the goal of cortical functional localization. The temporally varying
test pattern was identical across conditions; the different static pedestal patterns determined the
stimulus condition: dynamic vernier, motion, contrast modulation or counterphase sinewave. This
approach has the advantage that the test signal that generates the EP was identical across conditions.
The results were compatible with the psychophysical results described above, but functional localization
was not possible with the limited number of recording channels available to us (ARVO 1990).

More recently we teamed up with Anthony Norcia and Peter Wong (ARVO 1991), who had
collected EPs to a variety of stimulus categories (vernier jitter, color, onset/offset, and counterphase
patterns) using a 21-electrode recording array. Our analysis of the data revealed some cortical
specificity for the type of stimulus pattern. Unfortunately, the stimuli were not selected according to the
test-pedestal paradigm, which complicated comparisons across stimulus conditions.

Based on these experiences we have proposed to localize in space and time the multiple
cortical sources of the EP by combining the multi-input analysis technique developed by Erich Sutter
with large EP recording arrays and careful stimulus control. We expect to track the processing of
cortical information in space and time and relate it to stimulus dimensions such as color, motion and
form, thereby furthering our understanding of cortical functional organization. In connection with this
plan we have developed improved methods for estimating the linear and nonlinear kernels from
multi-input white noise experiments (Klein, 1992, reference #8).
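
For readers unfamiliar with kernel estimation, the sketch below shows the textbook cross-correlation estimate of a first-order (linear) kernel from a Gaussian white-noise experiment. It is not the improved method of reference #8 and it handles a single input only; the simulated system, the kernel values and the function name first_order_kernel are assumptions made for illustration.

```python
import numpy as np


def first_order_kernel(stimulus, response, n_lags):
    """Cross-correlation estimate of the linear kernel for white-noise input."""
    x = stimulus - stimulus.mean()
    y = response - response.mean()
    power = np.mean(x**2)
    n = len(x)
    # Average of y[t] * x[t - lag] over the record, normalized by input power.
    return np.array(
        [np.dot(y[lag:], x[: n - lag]) / (n - lag) for lag in range(n_lags)]
    ) / power


# Simulated experiment (hypothetical system, not data from this report).
rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)                        # Gaussian white-noise stimulus
true_kernel = np.array([0.0, 0.5, 1.0, 0.4, -0.3, -0.1])
linear = np.convolve(x, true_kernel)[: len(x)]          # causal linear stage
y = linear + 0.2 * linear**2                            # added second-order nonlinearity

estimate = first_order_kernel(x, y, len(true_kernel))
print(np.round(estimate, 2))
```

Because odd moments of a zero-mean Gaussian input vanish, the squared term does not bias the first-order estimate; recovering the nonlinear kernels themselves requires higher-order correlations that this sketch omits.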

Published  and  In  Press  articles.  (January  1,  1990  -  February  4,  1992). 

One  copy  of  articles  1  - 16  has  been  enclosed  with  this  technical  report. 